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ABSTRACT 

Motivation: Synaptic connections underlie learning and memory in the 
brain and are dynamically formed and eliminated during development 
and in response to stimuli. Quantifying changes in overall density and 
strength of synapses is an important pre-requisite for studying con- 
nectivity and plasticity in these cases or in diseased conditions. 
Unfortunately, most techniques to detect such changes are either 
low-throughput (e.g. electrophysiology), prone to error and difficult 
to automate (e.g. standard electron microscopy) or too coarse (e.g. 
magnetic resonance imaging) to provide accurate and large-scale 
measurements. 

Results: To facilitate high-throughput analyses, we used a 50-year-old 
experimental technique to selectively stain for synapses in electron 
microscopy images, and we developed a machine-learning framework 
to automatically detect synapses in these images. To validate our 
method, we experimentally imaged brain tissue of the somatosensory 
cortex in six mice. We detected thousands of synapses in these 
images and demonstrate the accuracy of our approach using cross- 
validation with manually labeled data and by comparing against exist- 
ing algorithms and against tools that process standard electron mi- 
croscopy images. We also used a semi-supervised algorithm that 
leverages unlabeled data to overcome sample heterogeneity and im- 
prove performance. Our algorithms are highly efficient and scalable 
and are freely available for others to use. 

Availability: Code is available at http://www.cs.cmu.edu/ 

~saketn/detect_synapses/ 

Contact: zivbj@cs.cmu.edu 

1 INTRODUCTION 

The mammalian brain can contain hundreds of millions of 
neurons, each with thousands of specialized connections called 
synapses that enable indirect communication between cells. 
Estimates for the number of synapses in the mammalian brain 
ranges into the trillions. Synapses are essential for the transfer of 
information across neuronal ensembles, and individual synapses 
can be modulated by patterns of incoming neural activity, a phe- 
nomenon thought to underlie learning and memory. 

Changes in the relative strength and number of synapses can 
be regulated by a myriad of factors, including developmental 
age (Cowan et aL, 1984; Huttenlocher and Dabholkar, 1997; 
Stoneham et aL, 2010), sensory experience (Klintsova and 
Greenough, 1999), drug addiction (Van den Oever et aL, 2012), 
estrus cycle (Cooke and Woolley, 2005) and brain pathology. 
For example, in a form of autism linked to mutation of the 
Fragile X gene, spine density in the neocortex is elevated 
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(Hinton et aL, 1991; Pfeiffer and Huber, 2009), a feature that 
has also been observed in mice carrying the same genetic muta- 
tion (Nimchinsky et aL, 2001). Rett syndrome, another neurode- 
velopmental disorder, is characterized by smaller brain size 
caused by deficits in synaptogenesis (Glaze, 2004; Johnston 
et aL, 2005; Na and Monteggia, 2011) that results in fewer 
spines. Similarly, in Alzheimer's disease and other dementias, 
cognitive deficits are associated with reduced synapse density in 
the hippocampus, a brain structure critical for learning (Clare 
et aL, 2010). Understanding how connectivity across neurons can 
change is thus an important question that drives contemporary 
neuroscience research. 

Because synapse distribution is a useful and diagnostic criter- 
ion to evaluate circuit function in learning and disease, there 
have been a variety of methods used to estimate synaptic con- 
nectivity or overall synapse numbers. Electrophysiological meth- 
ods to estimate connectivity and the number of inputs per cell 
can be informative (e.g. Lefort et aL, 2009; Yassin et aL, 2010), 
but these approaches are low- throughput and can typically only 
capture tens or hundreds of connections in reasonable amounts 
of time (Walz, 2007). MRI-based techniques can be used to study 
network function at the level of brain regions or voxels, but they 
do not provide enough spatial resolution to estimate neural con- 
nectivity (Sporns, 2010). Anatomically, synapse densities are 
measured via light-microscopy to identify specialized substruc- 
tures called spines that stud the dendrites of neurons or using 
electron microscopy (EM) to identify ultrastructural features 
that correspond to pre- and post-synaptic elements. Traditional 
approaches have used cumbersome manual detection to count 
synapses in these images (e.g. White et aL, 1986; Knott et aL, 
2002; da Costa et aL, 2009; Morshedi et aL, 2009) and were thus 
constrained to small-scale measurements or required the use of 
specialized transgenic animals (Feng et aL, 2012; Kim et aL, 
2012) limiting their usage for studying plasticity and develop- 
ment in wild-type mice. 

Since the early 1990s, bioimage informatics has emerged as an 
important area in the analysis of biological images (Peng, 2008). 
Imaging datasets are usually much larger than other high- 
throughput biological datasets (e.g. confocal microscopy data 
can range in the hundreds of gigabytes for a single imaging ses- 
sion). Accurately identifying elements of interest (molecules, 
cells, synapses, etc.) within these massive datasets requires the 
development of sophisticated and efficient computational 
models. This often involves a classification-based strategy in 
which a (small) manually labeled training set is used to learn a 
general model that can be used to analyze a larger collection of 
images automatically. The key computational challenges involve 
the reliability and speed at which the analysis is done, as well as 
dealing with the heterogeneity of biological structures and noise 
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present in each image. Electron microscopy data suffer particu- 
larly from these problems and often contain undesired variation 
in intensity and contrast within and across samples and prepar- 
ations. This presents a major computational challenge because 
model parameters learned from one sample may not generalize 
to other samples. Although reconstruction and segmentation of 
conventional EM images has helped answer important questions 
about brain structure and function (Bock et ai, 2011; Denk 
et ai, 2012), these approaches have yet to reach the point of 
full automation (Merchan-Perez et ai, 2009; Jain et ai, 2010; 
Cardona et al, 2012), which has limited their scale and accuracy 
(Kreshuk et ai, 2011; Morales et ai, 2011). 



2 APPROACH 

To aid in identifying and quantifying synapses in EM images, we 
used a 50-year-old experimental technique (Bloom and 
Aghajanian, 1966, 1968) to selectively stain for synapses in any 
animal model (Fig. 1A). Unlike conventional EM, this protocol 
uses ethanolic phosphotungstic acid (EPTA) to pronounce elec- 
tron opacity at synaptic sites by targeting specific proteins in 
contact zones. This technique typically leaves non-synaptic mem- 
branes (e.g. plasma membranes, neurotubules and vesicles) un- 
stained, though considerable variation can exist from sample to 
sample. We dissected brain tissue in the mouse somatosensory 
cortex and performed EM experiments using the EPTA method. 
We used mice from different ages (PI 4, PI 7 and P75) and iso- 
lated the same cortical region in each animal to gauge variance in 
sample quality. 

To demonstrate the advantages of this protocol for large-scale 
synapse identification, we developed a machine-learning frame- 
work to detect synapses in these images in a high- throughput and 
fully-automated manner. We describe a two-step approach. First, 
we use a highly accurate first-pass filtration step to reduce the 
search space of possible synapses by 1-2 orders of magnitude, 
thereby significantly reducing false positives. Second, we train a 
classifier to recognize synapses using texture- and shape-based 
features extracted from small image patches around potential 
objects of interest. We show that our approach is highly accurate 
and that it outperforms correlation-based (Roseman, 2004) tech- 
niques and an automated technique designed to detect synapses 
in conventional EM images (Morales et al, 2011). 

To further improve classification and adjust for varying ex- 
perimental conditions (different sample, different microscope, 
different person performing experiments, etc.), we developed a 
model that classifies synapses in images using both labeled and 
unlabeled data. This type of approach is known as semi-super- 
vised learning (Zhu, 2005), as it combines ideas from supervised 
learning (classification) and unsupervised learning (clustering). 
This technique can be used to build more robust classifiers in 
cases (such as ours) where large imaging datasets can easily be 
collected but where it is much harder to manually annotate these 
images. By integrating unlabeled data in the learning phase, a 
new sample can help fine-tune parameters of a model built from 
a previous sample, which can improve accuracy without requir- 
ing users to manually annotate images in the new sample. 
Indeed, we show that a classifier learned only on labeled data 
from our P14 samples and tested on our P75 samples performs 



worse than a semi-supervised classifier that also leverages un- 
labeled data from P75. 



3 METHODS 

First, we describe the experiments we performed to gather and process 
mouse brain tissue for EM imaging with EPTA, and then we describe our 
machine-learning framework to automatically detect synapses in these 
images. 

3.1 Experimental approach and data collected 

We gathered tissue from the mouse somatosensory cortex because it is a 
well-characterized anatomical area in the cortex (Fox, 2008) and thus 
serves as an important benchmark for the validity of our experimental 
approach. To prepare the tissue for EM analysis, we extracted, fixed 
(using 2.5% glutaraldehyde buffered with phosphate buffered saline) 
and sectioned 50-um thick tissue from wild-type C57bl6 mice at ages 
P14, P17 and P75 with two animals per time point. This range of tissue 
was collected to determine whether our experimental procedure and sub- 
sequent computational analysis remained robust to natural variation in 
tissue samples within and across time points and to variation in the image 
acquisition process. Each mouse whisker is somatotopically mapped to a 
single neocortical column. To ensure that the same cortical region was 
identified in each sample, we applied a mitochondrial stain (cytochrome 
oxidase) to each section to visually identify layer 4 (called the barret) of 
the Dl column/whisker. The barrel was extracted using a dissecting light 
microscope. 

Tissue was prepared for transmission electron microscopy in a series of 
steps. First, the tissue was washed with three changes (5min each) of 
distilled water, followed by an incubation in 0.02% NaOH for lOmin. 
This latter step was absent from the original procedure of Bloom and 
Aghajanian (1966, 1968), but we found that it helped increase the contrast 
of synapses (Fig. 1). Second, the tissue was dehydrated with an ascending 
series of EtOH (25, 50, 70, 80, 90 and 100%), followed by fixation with 
1% phosphotungstic acid (PTA) in 100% EtOH. Third, a small amount, 
7 ul, of 95% ethanol, was added to each 1000 |il of PTA stain used, and 
the PTA was washed from the sample with two changes of 100% ethanol. 
Fourth, propylene oxide was used as a transitional solvent (the first 
change of propylene oxide was on ice), and then the specimen was infil- 
trated with Spurr embedding resin, which was polymerized at 60° C for 
48 h. Finally, lOOnm sections were cut using a diamond knife on an 
ultramicrotome, which were picked up using 50 or 75 mesh copper 
grids. The specimen was observed using a transmission electron micro- 
scope, and images were taken digitally. 

We took roughly 1 30 images per animal (each image is of size 5 um by 
5 urn) covering a total surface area of ~3000 um 2 per sample and with a 
total of six samples (see Supplementary Information for additional details 
about image acquisition). Images were taken from a single plane, and 
therefore no correction was necessary to account for double-counting 
synapses across serially sectioned images. There was significant sample- 
to-sample variability in the images, but synapses were typically dark with 
most other biological structures washed away (Fig. 1A). In total, we 
collected >800 images each containing between 0 and 12 synapses. The 
raw images were provided as input to the machine-learning algorithms 
described later in the text. 

3.2 Strategies for effectively detecting synapses 

As described earlier in the text, electron microscopy images are inherently 
noisy owing to variations in the samples (e.g. different age), in the manual 
processing steps and in the image acquisition process. To overcome these 
issues, we developed a pipeline that uses object segmentation, background 
information, normalization and alignment to obtain a better feature 
set that can be used for effective classification. 
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Fig. 1. Experimental technique and training data collected. (A) Comparison of conventional and EPTA-stained EM images shows a marked difference in 
clarity of synaptic structures, albeit high sample-to-sample variability. (B) Subset of positive and negative examples taken from multiple images across all 
samples. The original examples exhibit high variance and noise; normalization and alignment reduces this heterogeneity 



1. Reducing the search space of possible synapses via 
segmentation One popular technique to search for objects in an 
image is the sliding window approach (Szeliski, 2010). In this approach, 
non-overlapping windows of a pre-defined size are slid across the image, 
and each window is classified as synapse-containing (positive) or not 
(negative). One drawback to this technique is that it can double-count 
synapses that lie in multiple windows or miss synapses that lie adjacent to 
each other. A more thorough approach is to use overlapping windows, 
but this greatly increases computational complexity when using large 
images, and subsequent post-processing steps still need to account for 
double-counting synapses. Further, this technique creates a large imbal- 
ance of negative-to-positive examples (windows), which may lead to 
many false positive predictions. 

To circumvent these problems, instead of using sliding windows, we 
apply a filtration step to reduce the number of windows to test by lever- 
aging the fact that the experimental procedure outlined earlier in the text 
is designed specifically to stain synapses (there may be other biological 
structures, such as mitochondria and membranes, that also appear in the 
image, but rarely are there synapses that do not appear as dark). To avoid 
the challenge of selecting thresholds to segment the image (which may not 
generalize across samples owing to differences in intensity and enhance- 
ment), we adopt the contrast-limiting adaptive histogram equalization 
algorithm (Zuiderveld, 1994). This method enhances the contrast of 
each window T in the image to approximately match a flattened histo- 
gram by mapping each pixel value v e [0, 255] to its value in the cumu- 
lative distribution f(v) computed in T: 

fly) = if>st(0 =f(v - l) + -hist(v), (1) 

i=0 

where hist(-) represents the original histogram of the image and n is the 
number of pixels in the window. We also limit the enhancement by clip- 
ping the histogram at a scalar value of 0.20 before enhancement. The 
window is then rescaled to [0,255]. The equalization is performed in each 
local window of the image, and then the windows are combined using 



bilinear interpolation to eliminate artificially induced boundaries. By only 
considering small windows of the image, this technique prevents the over- 
amplification of noise or artifacts that only appear in localized regions of 
the image. 

Next, we binarize the equalized image using a single, sample-independ- 
ent threshold, which was determined manually to be 10% (i.e. only the 
top 10% of pixels values are kept). The final segmentation is produced by 
computing connected components (segments) in the binary image. 
Compared with a non-overlapping sliding window approach that 
would produce roughly 300 windows to test per 1016 x 1024- sized pixel 
image on average, our segmentation produces 25-35 candidate segments 
per image and an overall ratio of negative-to-positive examples of 9:1. We 
also allow for optional filtration of segments that are too small or too 
large to be synapses, as defined by the user. This step is only designed to 
produce segments and not to normalize the image, which we do separ- 
ately later in the text. 

To validate that the segmentation step preserves synapses, we looked 
at two samples (PI 4 and P75) and manually checked what percentage of 
the first 100 synapses encountered were correctly segmented. We found 
that only 1% (1 in each sample) of the synapses were lost, and these were 
always due to two synapses touching each other, which caused them to be 
merged into a single segment. 

2. Using background cues to augment synapse identification One 
key indicator to decide whether a candidate segment is a synapse is the 
physical context in which it lies. We consider a local 75 x 75-pixel window 
around the centroid of each segment to capture information about the 
object and the neighborhood surrounding the object. This is important 
because elongated synapse-like segments may also appear within mito- 
chondria, but such segments are always surrounded by an oval-like con- 
tour marking the boundary of the mitochondria, which can be a useful 
cue for classification. Similarly, dark circular spots in the nucleus may 
also be stained (see Fig. 1A), but their neighborhoods typically contain 
other such spots, which also serve as strong discriminators. These are not 
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steadfast rules (synapses can certainly lie adjacent to mitochondria), but 
local neighborhoods are nonetheless a strong visual cue used by EM 
experts when annotating images (Arbelaez et al., 2011). 

3. Handling experimental variation across samples The aforemen- 
tioned technique produces a set of windows (in which each candidate 
segment is embedded), but the actual pixel values within these windows 
may vary significantly from sample-to-sample and image-to-image. 
Figure IB exemplifies the discrepancy that can exist across both negative 
and positive examples in the original images. These issues may be caused 
by a variety of factors (e.g. consistency of chemical reactions during 
sample preparation, age of the sample, skill of the experimenter, differ- 
ential illumination in the microscope, etc.) that are extremely difficult to 
overcome experimentally and thus must be accounted for computation- 
ally. To reduce the effect of these differences, we normalize each candi- 
date window W by computing (Arbelaez et al., 2011): W = 
where W xy is the current value at pixel (x,y) and fi and a are the mean 
and standard deviation of the pixel intensities in W. Figure IB shows that 
this significantly reduces the variance across windows, which helps fea- 
tures dependent on pixel intensities to be compared in an equal setting. 

3. Adjusting for synapse heterogeneity Synapses in EM images may 
be angled at any 2D orientation, which may add undesirable variation in 
training examples. To create a more invariant set of synapses, we applied 
the generalized Hough transformation (Duda and Hart, 1972) to auto- 
matically rotate the segment (within the candidate window) such that its 
major axis points vertically. Briefly, this is done by computing the Hough 
transform matrix H, where entry ij of H corresponds to the number of 
points in the segment that fall along a line parameterized in polar coord- 
inates as r t = xcos(^) + j>sin(0,-). We find the element (r*,0*) in H that 
corresponds to the (peak) line for which the most segment points lie. The 
corresponding 9* is used to compute the angle of rotation. We then 
cropped the image to 60 x 60 pixels to remove the effects of the interpol- 
ation (this size is still large enough to fit almost all synapses). The out- 
come (Fig. IB) shows much greater uniformity for both synapses and 
non-synaptic structures compared with the original images. 

3.3 Supervised learning framework and features used to 
detect synapses 

After the processing steps aforementioned, each image is reduced to a set 
of candidate segments defined by a (normalized and aligned) square 
window around the centroid of the segment. Next, we build an accurate 
and robust classifier to discriminate between positive (synapses) and 
negative candidates using texture- and shape-based features. 

Texture is a common cue used by humans when manually segmenting 
structures from electron micrographs, and its use has become popular in 
many image processing tasks today (Arbelaez et al., 2011; Varma and 
Zisserman, 2003). In their seminal article, Leung and Malik (2001) 
defined texture by convolving an image with a bank of 48 Gaussian filters 
and used the filter responses at each image location to define a 'texton'. 
Textons were clustered and used to represent and classify images in a 
reduced dimension. Recently, more effective filter banks have been pro- 
posed to represent texture based on modeling joint distributions of inten- 
sity values over small and compact neighborhoods of the image (as 
opposed to the entire image), an approach we also adopt here. 

The maximum response (MR8) is one such technique that is derived 
from a set of 38 filters: 6 orientations x 3 scales x 2 oriented filters + 2 
isotropic filters. By recording only the maximum response across orien- 
tations, the number of responses is reduced to 8 (Varma and Zisserman, 
2003). Each pixel is now represented as an 8D vector of responses at its 
(x, y) location. For each dimension d, we compute a normalized histo- 
gram with 17 bins (larger bin sizes yielded marginal gain in performance 
but increased computational complexity) composed of responses for d 
over all pixels in the window. Thus, the texture of the window is 



represented as a 8 x 17-sized vector in W\ We used the default parameters 
for MR8 (Varma and Zisserman, 2003). 

Synapses also have a characteristic shape (typically long and elon- 
gated) that we also attempt to capture by extracting the following 
10 shape descriptors for each segment. These features operate on the 
binary segment only (they ignore the intensity values of the pixels) and 
therefore contribute different information than texture alone. 

(1) Area: the number of pixels in the segment. 

(2) Perimeter: the number of pixels in the boundary of the segment. 

(3) Major axis: the number of pixels constituting the major axis of the 
ellipse that has the same normalized second central moments (co- 
variance) as the segment. 

(4) Minor axis: same as above but for the minor axis of the ellipse. 

(5) Orientation: the angle between the x-axis and the major axis of the 
ellipse. 

(6) Eccentricity: the ratio of the distance between the foci of the ellipse 
and its major axis length. 

(7) Convex area: the number of pixels in the convex hull of the 
segment. 

(8) Solidity: the proportion of pixels in the convex hull that are also in 
the segment. 

(9) Diameter: the diameter of the circle with the same area as the 
segment. 

(10) Extent: the ratio of pixels in the segment to pixels in the smallest 
bounding box of the segment. 

The final feature we use is the histogram of oriented gradients (HoG) 
descriptor proposed by Dalai and Triggs (2005). We used nine orientation 
bins, cell and block sizes of 10 x 10 and 6x6, respectively, and a value of 
0.2 for clipping the L2-norm. Intuitively, this 334D feature describes the 
appearance of an object by concatenating the distributions of intensity 
gradients (edge directions) in different subregions of the window. This 
descriptor has been shown to outperform other popular feature sets 
(e.g. PCA-SIFT and generalized Haar wavelets) in a variety of object 
detection tasks (Dalai and Triggs, 2005; Skibbe et al, 2011). In total, 
each window is represented by a 480D feature vector in R n . All features 
are scaled to lie in [0, 1]. 

These features were then used to build a support vector machine (SVM) 
classifier (Chang and Lin, 2011) using a radial basis function kernel, and 
we performed a grid search to optimize parameters of the model. We also 
learned a Random Forest (Breiman, 2001) classifier and an AdaBoost 
ensemble model (Freund and Schapire, 1995) using 100 trees/learners, re- 
spectively. An overview of the supervised algorithm is shown in Figure 2. 

3.4 Semi-supervised learning 

The supervised algorithm described earlier in the text only uses labeled 
data to train a classifier. We now show how unlabeled data can be used to 
further improve the classifier using co-training. The co-training algorithm 
(Blum and Mitchell, 1998) assumes that features can be split into two 
different and fairly independent sets (in our case, texture and shape). 
Blum and Mitchell (1998) proved that for a classification problem with 
two such feature sets, the target concept can be learned based on a few 
labeled and many unlabeled examples, provided that the views are com- 
patible and uncorrected. The compatibility condition requires that all 
examples are identically labeled by the two classifiers (one for each of 
the feature sets). The uncorrelated condition means that for any pair of 
features, the two sets of features are independent, given the label. In real 
applications, these two conditions are rarely satisfied simultaneously. For 
example, in our task, compatibility may be hindered due to noise and 
imaging artifacts. Still, co-training has proven useful in several real-world 
applications (Zhu, 2005). 
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Algorithm 1 Det ect Synapses (I , Model ,r) 



Ihisteq = HistogramEqual(I) 
S = FindCandidateSegments(I histeq ) 
S = FilterSegments(S, min_size, max_size) 
for all remaining segments Si G S do 
W = Normal izeWindow( I ,Si) 
W = VerticalHoughTransf orm(W) 
Ftext = ExtractTextureFeatures(W) 



r hog 



ExtractHoGFeatures (W) 



Fshape = ExtractShapeFeatures(Si) 



if Model (F text , F hog , F shape ) > r then 

Predict Si as synapse 
else 

Predict Si as not a synapse 
end if 
end for 

Supervised algorithm to detect synapses in an EM image. 



Fig. 2. Main steps of the supervised synapse-detection algorithm and corresponding pseudocode. High-contrast objects are automatically segmented 
using histogram equalization. Shape-based features are extracted from these objects, as well as texture-based features from a small window surrounding 
the object. A model is learned from these two features types and is used to classify synapses 



In a co-training algorithm, two separate classifiers are trained with the 
labeled data using the two subfeature sets, respectively. Each classifier is 
applied to the unlabeled examples, and it outputs a list of positive (pu- 
tative synapses) and negative examples, ranked by confidence of assign- 
ments. We consider all the predictions in each list above a given threshold 
and any example for which both independent classifiers agree on its label 
is added to the labeled dataset. We iterate this process once and then 
retrain a final single classifier using all features from the original set of 
labeled examples and the new (predicted) set of labeled examples obtained 
during this iterative process. 

In addition to our attempts to normalize each image, this approach 
provides another way to account for variability in never-before-seen sam- 
ples without requiring explicit annotation, which can be cumbersome to 
obtain in practice. 

3.5 Testing and comparing with other approaches 

For experiments using supervised learning, we manually labeled 1 1 % (59) 
of the 520 images from P14 and P75 distributed equally across each 
sample. To do so, we performed the first-pass segmentation described 
earlier in the text and labeled the resulting windows (segments) as positive 
or negative. A total of 230 synapses were identified (each showing a clear 
post-synaptic density and elongated shape) along with 2062 negative ex- 
amples. The MR8, HoG and shape descriptors were extracted from each 
example and stored as a 480D feature vector. We performed 10-fold 
cross-validation and report precision, recall and area under the ROC 
and precision-recall curves. The confidence in each prediction was mea- 
sured based on the distance from the test vector to the decision boundary 
in feature space (for SVM) and based on the proportion of tree agree- 
ments for the Random Forest classifier. For supervised learning, we did 
not use any unlabeled data. 

For the semi- supervised approach, we selected all the labeled images 
from one sample (sample A, e.g. P14) and trained two classifiers using 
texture and shape features, respectively. Each classifier was then applied 
to the ~ 90% of unlabeled images in sample B (P75), and all predictions 



in either the positive or negative set in which both classifiers agreed (i.e. 
both predicted the same label with confidence above a given threshold) 
were added to the training set. A new single classifier C was then built 
using the known labeled examples from sample A as well as the high- 
confidence examples predicted by the co -trained classifiers. We also 
learned a baseline classifier C that was only trained on the known 
labels from sample A. Both classifiers were then tested for accuracy on 
the true labeled examples from sample B. 

We compared our supervised approach against a correlation-based 
technique (Roseman, 2004) that classifies a test window W as synapse- 
containing with respect to training set T if max, corr( W, 7/) > t; i.e. if the 
correlation coefficient between W and any positive example in the train- 
ing set is > t, where t e [0, 1]. We also compare against Espina (Morales 
et al., 2011), a tool designed to detect synapses in conventional EM 
images (see Supplementary Information). 

4 RESULTS AND DISCUSSION 

4.1 Validating against conventional EM 

One potential concern with using the EPTA method is that some 
synapses may be washed away alongside other non- synaptic 
structures. To validate the correctness of our experimental pro- 
cedure, we tested whether EPTA-stained images preserve roughly 
the same density of synapses that appear in conventional EM 
images of the same region. First, we isolated tissue corresponding 
to the D2 barrel of the mouse somatosensory cortex at P75. In 
one hemisphere, we performed standard EM chemistry and in 
the other hemisphere, we applied our EPTA stain. We then asked 
an expert EM biologist (J.S.) and an expert neuro scientist 
(A.L.B.) to manually annotate 26 conventional EM images for 
high- and medium-confidence synapses, and we compared dens- 
ity ratios versus our automated algorithm on the EPTA-stained 
tissue. 
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Table 1. EPTA-stained EM images preserve synapse density 



Technique 


Number of images 


Number of synapses 


Ratio 


Time 


Expert 1 (Standard EM, manual counting) 








0.77 min/im age 


High confidence only 


26 


71 


2.73 ±1.66 




High and medium confidence 


26 


88 


3.38 ±1.90 




Expert 2 (Standard EM, manual counting) 








0.58 mins/image 


High confidence only 


24 


65 


2.71 ±1.55 




High and medium confidence 


24 


89 


3.71 ±1.85 




Algorithm (EPTA EM, automatic counting) 








0.06 mins/image 


Threshold = 0.7 


275 


745 


2.71 ±2.00 




Threshold = 0.5 (default) 


275 


893 


3.25 ±2.19 





Note: We manually counted high and medium confidence synapses in 24—26 conventional EM images and used our algorithm to automatically count synapses in 275 EPTA- 
stained EM images of the same region. Different classifier thresholds allow us to closely approximate the true number of synapses. 



EPTA appears to conserve synaptic density compared with 
conventional EM (Table 1). In particular, the two experts 
found an average of 3.55 high and medium-confidence synapses 
per image within the conventional EM data versus 3.25 using the 
EPTA-stained images (an average difference of only 8%). We 
can even more closely approximate the number of high-confi- 
dence-only synapses by increasing the stringency of the classifier 
(a difference of <1 %). Because synaptic density may slightly vary 
according to pial depth of the specimen even within D2, some 
variation may be expected across hemispheres. 

If the EPTA stain were selectively staining for only asymmetric 
(excitatory) synapses and not symmetric (inhibitory) synapses, 
then we would expect a roughly 20% difference (Micheva and 
Beaulieu, 1995) between the number of synapses counted using 
EPTA and conventional EM. However, the close correspondence 
between the two methods suggests that we may be capturing both. 
This further demonstrates the validity of the EPTA stain as syn- 
apse-preserving (Bloom and Aghajanian, 1966) and provides a 
way of choosing an appropriate threshold for classification. 

4.2 Detecting synapses using supervised and 
semi-supervised learning 

As aforementioned, we used texture- and shape-based features to 
train several different classifiers (SVM, Random Forest and 
AdaBoost). We next compared the performance of these classi- 
fiers as well as a template-matching algorithm previously sug- 
gested for this task (Roseman, 2004) (Fig. 3). The SVM trained 
on both texture and shape features (MR8, HoG and Shape) 
performed best with an accuracy ranging from 54.8 to 81.3% 
on the positive set and 99.6 to 96.8% on the negative set depend- 
ing on the classifier threshold used. Specifically, the default 
threshold of 0.5 allowed us to recall 67.8% of the true synapses 
with a precision of 83.3% (Fig. 3B). This significantly outper- 
formed all other classifiers, as well as the correlation-based 
template-matching algorithm. These results suggest that the 
EPTA stain facilitates the high-throughput and automated 
detection of synapses compared with conventional EM. Such a 
performance would enable the large-scale use of this method 
to study experience-dependent plasticity in the brain and to 
detect abnormal changes in synaptic density owing to neuro- 
logical disorders. 
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Fig. 3. Cross-validation accuracy of all methods. AUC of the ROC 
curve (top) and precision-recall curve (bottom) show that the SVM 
trained using both texture and shape descriptors outperforms all other 
methods 
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The shape descriptors by themselves worked well for identify- 
ing positive examples but predicted many false positives owing to 
similar contours in, for example, nucleus regions, which are 
better discriminated using texture. The HoG features especially 
benefited from the vertical alignment of the synapses (a 7-10% 
decrease in false negatives with similar false positives comparing 
classifiers trained with and without alignment). 

To further validate our classifier with respect to human anno- 
tation, we took 30 unlabeled images from our P14 sample and 
asked an independent party (a technician in the Barth lab famil- 
iar with the EPTA protocol) to manually annotate all high-con- 
fidence synapses. We then applied our supervised classifier to 
these images using the default classifier threshold of 0.5 and ob- 
tained an accuracy of 87.25% (89/102) on the set of positive 
predictions (66.6% recall) and 96.28% (1164/1209) on the nega- 
tive predictions. These percentages are similar to those obtained 
using cross-validation. 

Next, we applied semi-supervised learning to detect synapses 
in new samples for which training data is not available (Table 2). 
Here, we trained a single classifier C on one sample (either P14 or 
P75) and used co-training to learn another classifier C using 
unlabeled examples from the other sample. We found that the 
accuracy of C increases amongst the positive class by 8-12% and 
increases the AUC by 1-4% compared with the baseline classifier 
C (see Section 3). We varied the number of unlabeled examples to 
transfer into C and found that using the top 0.5% of positive 
examples (with a corresponding number of negative examples to 
maintain the same ratio of positive-to-negative) improved accur- 
acy but that including the top 1.5% led to loss in performance. 

To explore how semi-supervised learning can account for the 
variance between mice at the some age, we trained on positive 
and negative examples from one P75 mouse and used the 
examples from the other P75 mouse to test (and vice versa). 
The baseline without semi-supervised learning had an average 
precision-recall AUC of 68.89% (78.81 and 58.97% individually) 
and an average ROC AUC of 95.80% (98.48 and 93.12% indi- 
vidually). Then, we used co- training to train a new classifier and 



Table 2. Semi-supervised learning boosts cross-sample accuracy 



Train/Test 


Co -training 


Accuracy 




AUC 








Positive 


Negative 


Prec-recall 


ROC 






(%) 


(%) 


(%) 


(%) 


P75/P14 


No 


66.36 


98.20 


73.65 


96.91 


P75/P14 


Yes (0.5%) 


72.90 


98.60 


75.75 


97.14 


P75/P14 


Yes (1.5%) 


74.77 


96.91 


73.06 


96.65 


P14/P75 


No 


48.78 


98.96 


60.50 


90.38 


P14/P75 


Yes (0.5%) 


60.16 


98.21 


64.23 


92.89 


P14/P75 


Yes (1.5%) 


60.98 


97.55 


63.80 


92.83 



Note: We trained a classifier using images from one sample and tested it on images 
from another sample (first column). This was done without co-training as a baseline 
and with co-training using two thresholds for selecting the number of unlabeled 
examples to include (second column). The third and fourth columns show accuracy 
on the positive and negative classes, and the last two columns show the precision- 
recall and ROC AUCs, respectively. Bold cells indicate best performance. 



improved the average precision-recall AUC to 73.06% (82.45 
and 63.66%) and the ROC AUC to 97.15% (98.78 and 
95.52%). Both these results demonstrate the power of using 
semi-supervised learning on unlabeled data, which is often plen- 
tiful but ignored in bioimage applications. 

Our approach is also scalable and efficient: using unoptimized 
MATLAB code, we can process a single image in 3.4 s on a 
standard desktop machine using a single processor. 

4.3 Development changes in the mouse barrel cortex 

Next, we used our method to study developmental changes in 
synapse density and size in a defined area of the mouse brain, the 
representation of a single specified (Dl) whisker in layer 4 of 
somatosensory cortex. We performed additional experiments 
and imaged the Dl barrel in two PI 7 mice following the same 




500 1000 1500 2000 2500 3000 3500 4000 

Size of synapse perimeter (nanometers) 



4500 



P-value < 0.01 for all 
three; pairwise tests 
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Size of synapse perimeter (nanometers) 

Fig. 4. Developmental changes and log-normal distribution of synaptic 
strength. (A) Histogram of synapse sizes (as measured by length of the 
perimeter of the synapse) from two P14 (blue), two P17 (red) and two P75 
(green) samples. There is significant growth in number of synapses from 
P14 to P17, followed by pruning by adulthood (P75). Further, all three 
distributions closely approximate a log-normal curve, which matches elec- 
trophysiological data on synaptic connection strengths (Song et al., 2005). 
(B) Cumulative distribution of synapse sizes suggest a shift in synapse 
strength over time (P<0.01, Kolmogorov-Smirnov test) 
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procedure as outlined in Section 3. We used our classifier to 
count synapses in images from both time points and, consistent 
with previous findings (De Felipe et al, 1997), synapse density 
appeared to rapidly increase from P14 to P17 (2.73 synapses per 
image at P14 versus 4.25 at PI 7; Fig. 4A). Further, we found that 
synapse density decreases in adults to about the same level as P14 
(2.78 synapses per image at P75), which is also consistent with 
prior work on pruning (White et al, 1997). Thus, our pipeline 
can be used to better understand the rates of developmental 
processes and can be used to define the precise timelines of 
their occurrence. 

Finally, to demonstrate the utility of the EPTA stain beyond 
simply detecting synapses, we used shape features to characterize 
the strength of each synapse based on its well-established correl- 
ation with spine size (Lee et al, 2012). Modulation in synaptic 
strength is an important facet of neuroplasticity and develop- 
ment (Wen and Barth, 2011) with larger contact areas likely to 
transmit more current to the post-synaptic neuron. We used our 
classifier-predicted synapses in all samples (PI 4, P17 and P75) 
and plotted the distribution of the perimeter of all synapses de- 
tected (larger perimeter larger size stronger synapse). 
Structurally similar synapses may appear to have different sizes 
when projected onto 2D because the image cross-sections may be 
taken at any angle. Exactly correcting for this bias would involve 
3D reconstruction of synapses from stacks of EM images, which 
is a separate and interesting problem in its own right (Jain et al, 
2010; Kreshuk et al, 2011; Merchan-Perez et al, 2009; Morales 
et al, 2011). 

We found that the distribution of synapse size at all ages fits a 
log-normal distribution (P<0.01 by Shapiro-Wilk and 
Anderson-Darling tests; Fig. 4A), which is consistent with pre- 
vious electrophysiological data (Song et al, 2005). We also 
observed that there was a significant shift in the distribution of 
synapse size toward the emergence of small synapses during the 
period of rapid increase in synapse density at PI 7, without a 
concomitant loss of large synapses within this short developmen- 
tal time window (Fig. 4B). These data suggest that developmen- 
tal processes may preferentially enable the addition of new 
synapses without necessarily augmenting the size of already- 
existing synapses. The data analysis pipeline that we have estab- 
lished can thus be used to generate and test specific hypotheses 
about synapse growth and maturation in the developing neocor- 
tex in a high-throughput and statistically robust manner. 

5 DISCUSSION AND CONCLUSIONS 

Synaptic density and strength are dynamically modulated in the 
brain and are important facets for understanding neural circuitry 
and function. Experience-dependent plasticity, circuit develop- 
ment and neuropathologies have all been linked to changes in 
the number and strength of synapses in the brain, and thus their 
characterization is a useful parameter to facilitate our under- 
standing of network function. Although current experimental 
techniques for studying these conditions have many advantages 
[e.g. electrophysiology provides a wealth of data not extractable 
from images, and conventional EM can be used to resolve syn- 
apses to extract pre- and post-synaptic partners and thus poten- 
tially allow for reconstruction of microcircuits (Denk et al, 
2012)], several interesting questions can also be answered by 



sampling from brain regions of interest over several conditions 
or time points and generating high-confidence large-scale statis- 
tics of synaptic structures present. But to do so requires both 
clarity in the data and robust algorithms to explore this data. 

We used an old experimental technique for selectively staining 
synapses in EM images. This technique does not require specia- 
lized animal models for enhancing synapses, and we validated it 
on new tissue of the mouse somatosensory cortex and against 
conventional EM. We collected >800 images from three time 
points and developed a fully automated and high- throughput ma- 
chine-learning framework that detected thousands of synapses in 
these images. We also used semi-supervised learning to learn 
models that can adapt to variability in new samples by leveraging 
unlabeled data. Such an approach is suitable for several other bio- 
image classification problems that also face issues of sample het- 
erogeneity. Our approach is general and scalable enough to 
handle large datasets and is freely available for others to use. 

For future work, it would be a great interest to perform im- 
munocytochemistry using EPTA against GABA and AMPA re- 
ceptors to separately classify symmetric and asymmetric 
synapses. Accuracy could also be improved by detecting synapses 
in 3D, though this would require many more images and more 
sophisticated computational techniques that can automatically 
segment, align and reconstruct synapses across serial sections. 
The 2D sampling-based strategies (e.g. considering sections sepa- 
rated by lOum) may certainly miss synapses, but if the same 
procedure is applied to each sample at each time point, the rela- 
tive number of synapses per image or unit area can still be com- 
pared in a fair manner. Further, to remove some biases in 2D 
analysis caused by larger synapses, previous works have pro- 
posed formulas to adjust counts based on the average size of 
synaptic profiles observed in the sample (Coggeshall and 
Lekan, 1996; Mayhew, 1996; Huttenlocher and Dabholkar, 
1997), which we can also use. 
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